Lo mejor de dos idiomas - Cross-Lingual Linkage of Geotagged Wikipedia Articles
Abstract
Different language versions of Wikipedia contain articles referencing the same place. However, the existence of an article in one language does not guarantee that a counterpart exists in another language, or that the two are linked. This paper examines geotagged articles describing places in Honduras in both the Spanish and the English language versions. It demonstrates that a method based on simple features can reliably identify article pairs describing the same semantic place concept, and it evaluates the method against the existing interlinks as well as a manual assessment.
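The abstract does not say which simple features the method uses. As a rough illustration only, the sketch below assumes two plausible ones: the great-circle distance between the geotags and a string similarity between the titles. The function names, record fields, and thresholds (`max_km`, `min_sim`) are hypothetical and are not taken from the paper.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def title_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] between two titles."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def is_candidate_pair(es, en, max_km=5.0, min_sim=0.8):
    """Assumed rule: geotags are close AND titles are similar.

    `es` and `en` are dicts with hypothetical keys "title", "lat", "lon".
    The thresholds are illustrative defaults, not values from the paper.
    """
    dist = haversine_km(es["lat"], es["lon"], en["lat"], en["lon"])
    sim = title_similarity(es["title"], en["title"])
    return dist <= max_km and sim >= min_sim


# Example records with approximate coordinates (illustrative only).
es_article = {"title": "Tegucigalpa", "lat": 14.1000, "lon": -87.2167}
en_article = {"title": "Tegucigalpa", "lat": 14.0942, "lon": -87.2067}
en_other = {"title": "San Pedro Sula", "lat": 15.5000, "lon": -88.0333}

print(is_candidate_pair(es_article, en_article))
print(is_candidate_pair(es_article, en_other))
```

A rule of this kind can be checked against the existing interlanguage links by treating linked pairs as positives, which is in the spirit of the evaluation the abstract describes.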
Similar resources
On finding cross-lingual article pairs
Finding a Wikipedia article in another language is often achievable with the built-in interlanguage links. We explore the possibility of automatically generating these links for geotagged articles as an application of entity resolution at the article level. This has the potential to improve Wikipedia, and it also allows a well-curated ground truth to be used for the merging algorithm. The resolution is bas...
Towards Cross-lingual Patent Wikification
This paper demonstrates the effectiveness of cross-lingual patent wikification, which links technical terms in a patent application document to their corresponding Wikipedia articles in different languages. The number of links clearly increases because different language versions of Wikipedia cover different sets of technical terms. We present an experiment of Japanese-to-English cross-lingu...
Untangling the Cross-Lingual Link Structure of Wikipedia
Wikipedia articles in different languages are connected by interwiki links that are increasingly being recognized as a valuable source of cross-lingual information. Unfortunately, large numbers of links are imprecise or simply wrong. In this paper, techniques to detect such problems are identified. We formalize their removal as an optimization task based on graph repair operations. We then pres...
A Comparison of Approaches for Measuring Cross-Lingual Similarity of Wikipedia Articles
Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Machine Translation and Cross-Language Information Retrieval. Articles written in different languages on the same topic are often connected through inter-language links. However, the extent to which these articles are similar is highly variable and this may impact on the use of Wikipedia as a compar...
Document Categorization using Multilingual Associative Networks based on Wikipedia
Associative networks are a connectionist language model with the ability to categorize large sets of documents. In this research we combine monolingual associative networks based on Wikipedia to create a larger, multilingual associative network, using the cross-lingual connections between Wikipedia articles. We prove that such multilingual associative networks perform better than monolingual as...